Buffering allows you to identify or define an area within a specified distance around a feature for spatial analysis or to indicate proximity or accessibility conditions. Buffering can be done on all three types of features: point, line, area. For example, you realize there is a correlation between long ambulance evacuation times and patient death rates. During the COVID-19 pandemic, patients with respiratory emergencies are particularly vulnerable to long evacuations. Thus, you would like to study how many African Americans in New York state live within a 5-mile radius of a hospital and create a map that assists both first responders and African Americans living in New York state in finding the nearest hospital with a 5-mile radius of a medical emergency location. You now have two shapefiles at hand, one is a New York state census tract shapefile with an attribute of African Americans population and the other is a shapefile of locations of all the hospitals in New York state. Let’s conduct this analysis using Python and the GeoPandas package. You can use the read_file() method from the geopandas package to read shapefiles and then convert the coordinate reference systems of the shapefiles to an appropriate one for the geography of your data in unit of metre by using the to_crs() method, and then create polygon buffers around features of interest using the buffer() method. Here is the data for this exercise.
If you already have Anaconda downloaded and installed, you can skip Part A and directly start the analysis in Part B. Make sure you also have packages pandas, geopandas of version 0.7 or higher, matplotlib, and descartes installed in the environment where you would like to conduct this analysis. Note starting with the version 0.7, geopandas made a big change in Coordinate Reference System representation and hence some code syntax differs before and after version 0.7, as described here and here. As our tutorial writes in new version syntax, please make sure your geopandas is of version 0.7 or later in order to run the code with no error. You may check the version of geopandas and upgrade it within your environment either in Anaconda Navigator or in your terminal window.
1) First, download Anaconda. Anaconda is a free and open-source distribution of Python. You can use Anaconda to install IDEs (integrated development environments where you can write and run code) and packages like Pandas and Geopandas. Go to the link to download Anaconda, https://www.anaconda.com/products/individual, and then open the .exe file that was downloaded and follow the instructions in the installation wizard prompt.
2) Once installation is complete, open Anaconda Navigator and create a new environment for your project. A Conda environment is a directory that contains a specific collection of Conda packages that you have installed. Conda has a default environment called 'base' that includes a Python installation and some core system libraries and dependencies of Conda. It is a “best practice” to avoid installing additional packages into your base environment, and, instead, create an isolated environment to manage packages and dependencies in a new project.
Click on the Environments selection in the left sidebar menu and then click on the 'Create' at the bottom. This will open a dialog box prompting you to create a name for the new environment. You can give any name for your new environment. Here, we use 'GIS_in_Python' as the environment name. Then click the 'Create' button within the dialog box to finish the creation.
3) Once you have your project environment set up, click on the arrow to the right of your new environment, 'GIS_in_Python' in this example, and select Open Terminal. This will give you access to the command line interface on your computer in a window.
4) Install the packages/libraries necessary for the analysis by entering the following commands in the opened terminal, one line at a time:
conda install pandas
conda install geopandas
conda install matplotlib
conda install descartes
5) Once you have those libraries all installed, select the new environment, 'GIS_in_Python' in this example, in the 'Applications on' dropdown menu, and then click "install" and "launch" under Jupyter Notebook. Jupyter Notebook will open in your web browser (it does not require the internet to work).
6) In Jupyter Notebook, navigate to the folder where you saved the code file you plan to use and open the .ipynb file (the extension for Jupyter Notebook files written in Python) to run it in the Notebook. If you would like to create a new .ipynb file, browse to the folder in which you would like to save your Notebook, then click the "New" dropdown button on the top-right and select "Python 3". Your new Notebook will open in a new tab in your browser. If you want to create a new directory using the Jupyter Notebook dashboard, click the "New" dropdown button and then select "Folder". To add files from your local machine, click the "Upload" button on the top-right to open a file chooser window and then choose the file you wish to upload.
1) Import necessary packages/libraries.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import geopandas as gpd
import descartes
2) Use the gpd.read_file() function from the geopandas package to read the shapefile. Optionally, you can use the head() method to return the first 5 rows of the GeoDataFrame, and use the .shape attribute to check the number of rows and columns of the GeoDataFrame in the returned tuple (number of rows, number of columns). For this example, the number of rows of the 'NYS_hospitals' and the 'NYS_population' suggest that there are 234 hospitals and 4918 census tracts in New York state.
NYS_hospitals = gpd.read_file("NYS_hospitals.shp")
NYS_hospitals.head()
| Name | Address_1 | City | State | Latitude | Longitude | Zip | geometry | |
|---|---|---|---|---|---|---|---|---|
| 0 | St Josephs Hospital Health Center | 301 Prospect Avenue | Syracuse | NY | 43.055726 | -76.149375 | 13203.0 | POINT (-76.14938 43.05573) |
| 1 | St James Mercy Hospital - Mercycare | 1 Bethesda Drive | North Hornell | NY | 42.346258 | -77.660415 | 14843.0 | POINT (-77.66041 42.34626) |
| 2 | Arnot Ogden Medical Center | 600 Roe Avenue | Elmira | NY | 42.099279 | -76.826766 | 14905.0 | POINT (-76.82677 42.09928) |
| 3 | Millard Fillmore Hospital | 3 Gates Circle | Buffalo | NY | 42.920331 | -78.867515 | 14209.0 | POINT (-78.86751 42.92033) |
| 4 | Schuyler Hospital | 220 Steuben Street | Montour Falls | NY | 42.351541 | -76.858602 | 14865.0 | POINT (-76.85860 42.35154) |
NYS_population = gpd.read_file("NYS_population.shp")
NYS_population.head()
| STATEFP | COUNTYFP | TRACTCE | GEOID | NAME | NAMELSAD | MTFCC | FUNCSTAT | ALAND | AWATER | ... | INTPTLON | Total | White | Blk/AfA | AmIn/AN | Asian | NHw/PaI | Other | AreaName | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 093 | 033102 | 36093033102 | 331.02 | Census Tract 331.02 | G5020 | S | 183329987.0 | 3232878.0 | ... | -074.1727018 | 6386 | 6293 | 48 | 0 | 13 | 0 | 32 | Census Tract 331.02, Schenectady County, New York | POLYGON ((-74.30655 42.75064, -74.30624 42.751... |
| 1 | 36 | 093 | 020102 | 36093020102 | 201.02 | Census Tract 201.02 | G5020 | S | 854904.0 | 0.0 | ... | -073.9161408 | 1926 | 1189 | 524 | 0 | 159 | 0 | 54 | Census Tract 201.02, Schenectady County, New York | POLYGON ((-73.92513 42.82029, -73.92440 42.821... |
| 2 | 36 | 093 | 032504 | 36093032504 | 325.04 | Census Tract 325.04 | G5020 | S | 66000589.0 | 1265253.0 | ... | -074.0288926 | 2269 | 2236 | 0 | 0 | 23 | 0 | 10 | Census Tract 325.04, Schenectady County, New York | POLYGON ((-74.09298 42.95587, -74.08805 42.955... |
| 3 | 36 | 081 | 003400 | 36081003400 | 34 | Census Tract 34 | G5020 | S | 213987.0 | 0.0 | ... | -073.8581871 | 2345 | 862 | 179 | 0 | 504 | 0 | 800 | Census Tract 34, Queens County, New York | POLYGON ((-73.86629 40.68319, -73.86549 40.683... |
| 4 | 36 | 081 | 003900 | 36081003900 | 39 | Census Tract 39 | G5020 | S | 146970.0 | 0.0 | ... | -073.9409316 | 1310 | 764 | 94 | 11 | 140 | 0 | 301 | Census Tract 39, Queens County, New York | POLYGON ((-73.94330 40.76066, -73.94247 40.761... |
5 rows × 21 columns
Let’s look at the shape of the shapefiles for 'NYS_hospitals' and 'NYS_populuations'.
print(NYS_hospitals.shape)
print(NYS_population.shape)
(234, 8) (4918, 21)
You may also use matplotlib for plotting to generate an overview of your GeoDataFrame.
plt.subplots(), "fig" short for figure, and "ax" short for axes.figsize = (12,12), the first number corresponding to width, the X axis, and the second corresponding to height, the Y axis. ax = ax sets axes on which to draw the plot.
fig, ax = plt.subplots(figsize=(12, 12))
NYS_hospitals.plot(ax=ax)
<AxesSubplot:>
fig, ax = plt.subplots(figsize=(12, 12))
NYS_population.plot(ax=ax)
<AxesSubplot:>
3) Before buffering, use the .crs attribute to check the current Coordinate Reference System (CRS)/projection of your spatial data, and if they are not projected into a projection using unit of metre, use the to_crs() method to re-project the data to a projection appropriate for the geographical area of your data and based in metre. This step ensures that you will be able to create buffers in distance unit. Also, make sure that all layers involved in the analysis have the same projection as it makes it possible to analyze the spatial relationships between layers.
In this example, 'NYS_hospitals' and 'NYS_population' are the two GeoDataFrame that we need to examine their projection (.crs). Both 'NYS_hospitals' and 'NYS_population' are in EPSG:4269 of which CRS is in latitude and longitude in decimal degrees, so we need to re-project the two layers. A bit of research suggests that EPSG:3627 would be an appropriate projection for New York state in the unit of metre. Therefore, we convert the CRS of 'NYS_hospitals' and 'NYS_population' to EPSG:3627 (.to_crs('epsg:3627')) so that next we can create buffers in metres.
NYS_hospitals.crs
<Geographic 2D CRS: EPSG:4269> Name: NAD83 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands. - bounds: (167.65071105957, 14.928194130078, -47.743430543984, 86.453745196084) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
NYS_population.crs
<Geographic 2D CRS: EPSG:4269> Name: NAD83 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands. - bounds: (167.65071105957, 14.928194130078, -47.743430543984, 86.453745196084) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
NYS_hospitals = NYS_hospitals.to_crs('epsg:3627')
NYS_population = NYS_population.to_crs('epsg:3627')
4) Use the buffer() method to perform buffering. The buffer() method takes in a buffer distance and returns a GeoSeries of geometries with boundaries the buffer distance away from the input geometries. For this example, the geometries are points in the 'NYS_hospitals' (NYS_hospitals['geometry']) which are the locations of hospitals, and we create a 8km (around 5 miles) radius buffer around the hospital points (.buffer(8000)). Optionally, you can use the head() method to return the first 5 elements of the GeoSeries.
buf = NYS_hospitals['geometry'].buffer(8000)
buf.head()
0 POLYGON ((132784.007 323086.321, 132745.485 32... 1 POLYGON ((6367.558 248385.524, 6329.036 247601... 2 POLYGON ((74150.726 218414.338, 74112.203 2176... 3 POLYGON ((-89469.017 316930.646, -89507.539 31... 4 POLYGON ((72432.475 246513.537, 72393.953 2457... dtype: geometry
5) You can now use matplotlib to generate an overview of the buffers. For this example, based on the map of 'NYS_population' we generated in the Step 2, further specify that:
Ideally, we want the filled color of the census tracts to be white (facecolor='white'), and the color of the border lines to be black (edgecolor='black');
We also want to plot the buffers on the same axes (ax=ax) with a bit transparent color (alpha= 0.5);
fig, ax = plt.subplots(figsize=(12, 12))
NYS_population.plot(ax=ax, facecolor='white', edgecolor='black')
buf.plot(ax=ax, alpha= 0.5)
NYS_hospitals.plot(ax=ax, color="red", markersize=5)
<AxesSubplot:>
6) Now with the buffers generated, we can study the question of how many African American in New York state live within 5 miles to hospitals through the following steps.
We can use the intersects() method to check if a census tract (the $i_{th}$ element in the 'NYS_population['geometry']', i.e. 'NYS_population['geometry'][i]') intersects with ('.intersects') each of the buffers ('buf'). The intersects() method returns a Series of True or False, indicating if a census tract is within the 5 miles buffers of each of the 234 hospitals in New York state, saved in a new variable 'tract_intersects_buffers'.
Use the sum() method to get the sum of the values for the Series of True or False, given that Booleans are considered a numeric type in Python and arithmetic operations can be applied to Booleans, with True equal to 1 and False equal to 0. If a census tract intersects with any of the buffers, or in another word, if there is at least one True value in the Series, the result of 'tract_intersects_buffers.sum()' will be greater than 0, and then we append the index of the census tract to the list 'idx_tracts_close2hospital' ('idx_tracts_close2hospital.append(i)') which is empty upon creation ('idx_tracts_close2hospital = [ ]'). If a census tract doesn't intersect with any of the buffers, or in another word, if there is no True value in the Series, the result of 'tract_intersects_buffers.sum()' will be 0, and we don't save the index of the census tract in the list 'idx_tracts_close2hospital'.
intersects() method works in a row-wise manner. It does not check if an element of one GeoSeries crosses any element of the other one.
idx_tracts_close2hospital = []
for i in range(1,len(NYS_population.index)):
tract_intersects_buffers = buf.intersects(NYS_population['geometry'][i])
if(tract_intersects_buffers.sum() > 0):
idx_tracts_close2hospital.append(i)
Optionally, we can use len() function to check the length of the list 'idx_tracts_close2hospital' and it shows that the number of census tracts that are close to at least one hospital within 5 miles is 4400.
print(len(idx_tracts_close2hospital))
4400
To have a look at the buffers around hospitals and the tracts that have at least one intersection with the buffers, we can again use matplotlib to do the visualization. Based on the map we generated in the Step 5, we add on top of it the following specifications.
Add a layer of census tracts that are within 5 miles to hospitals. We use the iloc() method to subset these census tracts from the complete dataset, 'NYS_population', by their integer-location based index ( NYS_population.iloc[idx_tracts_close2hospital, : ] ). Using square brackets can slice the GeoDataFrame, and the colon character (' : ') tells GeoPandas that we want to retrieve all columns. We also specify that we want to plot the census tracts on the same axes (ax=ax) with yellow color (color='yellow'). Since layers are stacked from bottom to top in the same order of the corresponding calls to the plot function in matplotlib, in order to show the buffer areas and hospital points clearly on the map, we add the new layer before them.
Use legend() method to place a legend on the axes (ax.legend( )). To explicitly define the elements in the legend, we use 'handles' argument to specify that the two patches, 'yellow_patch' and 'white_patch', are added to the legend (handles=[yellow_patch, white_patch]), and set the location of the legend to be lower left on the plot (loc = 'lower left'), the font size of the legend to be 10 (fontsize = 10), and the title of the legend be "Close to Hospital?" (title='Close to Hospital?'). The two patches, 'yellow_patch' and 'white_patch', are created by the mpatches.Patch() class. The patch that have face color of yellow (facecolor='yellow') and edge color of black (edgecolor = 'black') and label of 'Yes' (label='Yes') is to indicate that census tracts that can access hospitals within 5 miles are displayed in yellow and the patch that have face color of white and edge color of black and label of 'No' is to indicate the census tracts that don't have access to hospitals within 5 miles are displayed in white.
set_title() method to add a title for the axes. The title is set to be in bold (fontweight='bold') and size 15 (size=15).yellow_patch = mpatches.Patch(facecolor='yellow', edgecolor = 'black', label='Yes')
white_patch = mpatches.Patch(facecolor='white', edgecolor = 'black', label='No')
fig, ax = plt.subplots(figsize=(12, 12))
NYS_population.plot(ax=ax, color='white', edgecolor='black')
NYS_population.iloc[idx_tracts_close2hospital,:].plot(ax=ax, color='yellow')
buf.plot(ax=ax, alpha=.5)
NYS_hospitals.plot(ax=ax, color='red', markersize=5)
ax.legend(handles=[yellow_patch, white_patch], loc = 'lower left',fontsize = 10, title='Close to Hospital?')
ax.set_title('New York census tracts that fall within 5 miles of hospitals', fontweight='bold', size=15)
Text(0.5, 1.0, 'New York census tracts that fall within 5 miles of hospitals')
Now let’s calculate how many African Americans in New York state live within 5 miles to hospitals. Remember the variable 'idx_tracts_close2market' saves the indexes of tracts that intersect with at least one buffer of hospitals. We use the indexes to get the population of African Americans who live in each of these tracts (NYS_population.iloc[idx_tracts_close2hospital, : ]['Blk/AfA']), then use astype() method to convert the data type to be integer and then use sum() method to calculate the sum of the populations. The value is saved in a new variable, 'blacks_within_5miles', which is then divided by the total number of African Americans in all 4918 tracts.
blacks_within_5miles = NYS_population.iloc[idx_tracts_close2hospital,:]['Blk/AfA'].astype('int64').sum()
blacks_within_5miles/NYS_population['Blk/AfA'].astype('int64').sum()
0.9795718178381071
In conclusion, the calculation result shows that 97.96% African Americans in New York state live within 5 miles distance to at least one hospital.